38 research outputs found

    Crawling in Rogue's dungeons with (partitioned) A3C

    Rogue is a famous dungeon-crawling video game of the 1980s and the ancestor of its genre. Rogue-like games are known for requiring the player to explore partially observable, randomly generated labyrinths that differ on every run, which prevents any form of level replay. As such, they serve as a very natural and challenging task for reinforcement learning, requiring the acquisition of complex, non-reactive behaviors involving memory and planning. In this article we show how, by exploiting a version of A3C partitioned on different situations, the agent is able to reach the stairs and descend to the next level in 98% of cases. Comment: Accepted at the Fourth International Conference on Machine Learning, Optimization, and Data Science (LOD 2018

    Identifying Critical States by the Action-Based Variance of Expected Return

    The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulty deciding when to choose exploitation, as well as extracting useful points for a brief explanation of the agent's operation. One reason for these difficulties is that such approaches treat all states the same way. Here, we show that identifying critical states and treating them specially benefits both problems. These critical states are the states at which the action selection substantially changes the potential for success or failure. We propose to identify the critical states using the variance in the Q-function over the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly. Comment: 12 pages, 6 figure
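The idea in the abstract above (flag a state as critical when its Q-values vary strongly across actions, then exploit with high probability there) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `threshold`, the `critical_epsilon` parameter, and all function names are assumptions introduced only for illustration, and the paper may select critical states differently.

```python
import random


def action_variance(q_values):
    """Population variance of the Q-values over the actions of one state."""
    mean = sum(q_values) / len(q_values)
    return sum((q - mean) ** 2 for q in q_values) / len(q_values)


def is_critical(q_values, threshold):
    """Hypothetical rule: a state is critical if its action variance exceeds a threshold."""
    return action_variance(q_values) > threshold


def select_action(q_values, epsilon, threshold, critical_epsilon=0.0, rng=random):
    """Epsilon-greedy selection that explores less in critical states.

    In ordinary states the agent explores with probability `epsilon`;
    in critical states it explores with probability `critical_epsilon`
    (assumed 0.0 here, i.e. always exploit).
    """
    eps = critical_epsilon if is_critical(q_values, threshold) else epsilon
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the action with the highest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With Q-values such as `[0.0, 10.0]` and `threshold=1.0`, the state counts as critical and the greedy action is chosen even if `epsilon` is large; with near-equal Q-values the usual epsilon-greedy exploration applies.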

    C-tests revisited: back and forth with complexity

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28. We explore the aggregation of tasks by weighting them using a difficulty function that depends on the complexity of the (acceptable) policy for the task (instead of a universal distribution over tasks or an adaptive test). The resulting aggregations and decompositions are (now retrospectively) seen as the natural (and trivial) interactive generalisation of the C-tests. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013. Hernández Orallo, J. (2015). C-tests revisited: back and forth with complexity. In Artificial General Intelligence: 8th International Conference, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing, pp. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has attracted great attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201

    A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

    Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync. Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-syn

    Online Continual Learning on Sequences

    Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial for agents and robots that operate in changing environments and are required to acquire, fine-tune, and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity, and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures via the re-occurrence of (latent representations of) input sequences and that functionally resemble mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform architectures without it in (online) incremental learning tasks. Comment: L. Oneto et al. (eds.), Recent Trends in Learning From Data, Studies in Computational Intelligence 89

    Increasing generality in machine learning through procedural content generation

    Procedural Content Generation (PCG) refers to the practice, in videogames and other games, of generating content such as levels, quests, or characters algorithmically. Motivated by the need to make games replayable, as well as to reduce authoring burden, limit storage space requirements, and enable particular aesthetics, a large number of PCG methods have been devised by game developers. Additionally, researchers have explored adapting methods from machine learning, optimization, and constraint solving to PCG problems. Games have been widely used in AI research since the inception of the field, and in recent years have been used to develop and benchmark new machine learning algorithms. Through this practice, it has become more apparent that these algorithms are susceptible to overfitting. Often, an algorithm will not learn a general policy, but instead a policy that will only work for a particular version of a particular task with particular initial parameters. In response, researchers have begun exploring randomization of problem parameters to counteract such overfitting and to allow trained policies to more easily transfer from one environment to another, such as from a simulated robot to a robot in the real world. Here we review the large amount of existing work on PCG, which we believe has an important role to play in increasing the generality of machine learning methods. The main goal is to present RL/AI with new tools from the PCG toolbox; a secondary goal is to explain to game developers and researchers how their work is relevant to AI research.

    Trappin-2/Elafin Modulate Innate Immune Responses of Human Endometrial Epithelial Cells to PolyI∶C

    BACKGROUND: Upon viral recognition, innate and adaptive antiviral immune responses are initiated by genital epithelial cells (ECs) to eradicate or contain viral infection. Such responses, however, are often accompanied by inflammation that contributes to acquisition and progression of sexually transmitted infections (STIs). Hence, interventions/factors enhancing antiviral protection while reducing inflammation may prove beneficial in controlling the spread of STIs. Serine antiprotease trappin-2 (Tr) and its cleaved form, elafin (E), are alarm antimicrobials secreted by multiple cells, including genital epithelia. METHODOLOGY AND PRINCIPAL FINDINGS: We investigated whether and how each Tr and E (Tr/E) contribute to antiviral defenses against a synthetic mimic of viral dsRNA, polyinosine-polycytidylic acid (polyI:C) and vesicular stomatitis virus. We show that delivery of a replication-deficient adenovector expressing Tr gene (Ad/Tr) to human endometrial epithelial cells, HEC-1A, resulted in secretion of functional Tr, whereas both Tr/E were detected in response to polyI:C. Moreover, Tr/E were found to significantly reduce viral replication by either acting directly on virus or through enhancing polyI:C-driven antiviral protection. The latter was associated with reduced levels of pro-inflammatory factors IL-8, IL-6, TNFα, lowered expression of RIG-I, MDA5 and attenuated NF-κB activation. Interestingly, enhanced polyI:C-driven antiviral protection of HEC-Ad/Tr cells was partially mediated through IRF3 activation, but not associated with higher induction of IFNβ, suggesting multiple antiviral mechanisms of Tr/E and the involvement of alternative factors or pathways. CONCLUSIONS AND SIGNIFICANCE: This is the first evidence of both Tr/E altering viral binding/entry, innate recognition and mounting of antiviral and inflammatory responses in genital ECs that could have significant implications for homeostasis of the female genital tract